22 research outputs found

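    Dotare il sardo di dati normativi su età d’acquisizione, familiarità e accordo sul concetto: Uno studio preliminare con 50 figure di Snodgrass & Vanderwart (1980) [Providing Sardinian with normative data on age of acquisition, familiarity, and concept agreement: A preliminary study with 50 pictures from Snodgrass & Vanderwart (1980)]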

    In the present work, normative data specific to the Sardinian language were collected for a set of 50 pictures taken from the well-known study by Snodgrass & Vanderwart (1980). The parameters on which these normative data were obtained are among the most studied in the literature: Age of Acquisition (AoA), Familiarity (FAM), and Concept Agreement (CA). A total of 106 native speakers of Sardinian took part in the experiment, which was carried out entirely in written form via an online platform. In addition to providing normative data on these parameters for each of the 50 images, this work found that AoA and FAM are strongly negatively correlated; both parameters also correlate with the Concept Agreement measure, although these correlations are decidedly more moderate. The results were also compared with those of two studies that collected normative data for Italian on the same parameters: Nisi et al. (2000) and Dell’Acqua et al. (2000). Sardinian participants judged the depicted objects as significantly more familiar and reported having learned the words denoting those objects significantly earlier. For CA, on the other hand, the Italian data show a significantly higher average percentage. However, while for AoA and FAM a strong positive correlation was found between the Italian and Sardinian data, the data for the two languages are clearly uncorrelated for CA, suggesting that the degree of ease in finding a valid name for a picture is dictated by different factors in a national language such as Italian than in a local language such as Sardinian. More generally, this shows that, before carrying out picture-naming tasks in a given language, it is advisable to have normative data specific to that language, even if it is a minority language or a dialect.
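
The abstract reports correlations between parameters without naming the coefficient used. As a minimal sketch, assuming a Pearson correlation over per-picture mean ratings, the computation could look like the following. The `aoa` and `fam` values are invented toy numbers for illustration only, not data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical per-picture means (NOT from the study):
# a later age of acquisition paired with a lower familiarity rating.
aoa = [2.1, 3.4, 4.0, 5.2, 6.3]
fam = [4.8, 4.1, 3.9, 2.7, 2.0]

r = pearson(aoa, fam)
print(f"r(AoA, FAM) = {r:.3f}")
```

A value of `r` close to -1 corresponds to the kind of strong negative relationship the abstract describes between AoA and FAM.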

    CAPISCO @ CONcreTEXT 2020: (Un)supervised Systems to Contextualize Concreteness with Norming Data

    This paper describes several approaches to the automatic rating of the concreteness of concepts in context, developed for the EVALITA 2020 “CONcreTEXT” task. Our systems focus on the interplay between words and their surrounding context by (i) exploiting annotated resources, (ii) using BERT masking to find potential substitutes of the target in specific contexts and measuring their average similarity with concrete and abstract centroids, and (iii) automatically generating labelled datasets to fine-tune transformer models for regression. All the approaches have been tested on both English and Italian data. The best system for each language ranked second in the task.
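
Approach (ii) can be illustrated with a minimal sketch. It assumes the substitute words have already been obtained (for example from a masked language model) and mapped to vectors, and that the final score is the difference between the two average similarities; the abstract specifies neither, so both are assumptions. All vectors below are invented 2-dimensional toy values, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def concreteness_score(substitutes, concrete_centroid, abstract_centroid):
    """Average similarity to the concrete centroid minus average
    similarity to the abstract centroid (the difference is an assumption)."""
    n = len(substitutes)
    sim_concrete = sum(cosine(v, concrete_centroid) for v in substitutes) / n
    sim_abstract = sum(cosine(v, abstract_centroid) for v in substitutes) / n
    return sim_concrete - sim_abstract

# Toy seed vectors standing in for known concrete and abstract words
concrete_centroid = centroid([[1.0, 0.1], [0.9, 0.2]])
abstract_centroid = centroid([[0.1, 1.0], [0.2, 0.9]])

# Toy vectors standing in for substitutes of a target word in one context
substitutes = [[0.8, 0.3], [1.0, 0.0]]
score = concreteness_score(substitutes, concrete_centroid, abstract_centroid)
print(f"contextual concreteness score: {score:.3f}")
```

A positive score means the substitutes sit closer to the concrete centroid; a negative score means they sit closer to the abstract one.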

    DANKMEMES @ EVALITA 2020: The Memeing of Life: Memes, Multimodality and Politics

    DANKMEMES is a shared task proposed for the 2020 EVALITA campaign, focusing on the automatic classification of Internet memes. Providing a corpus of 2,361 memes on the 2019 Italian Government Crisis, DANKMEMES features three tasks: A) Meme Detection, B) Hate Speech Identification, and C) Event Clustering. Overall, 5 groups took part in the first task, 2 in the second and 1 in the third. The best system was proposed by the UniTor group and achieved an F1 score of 0.8501 for task A, 0.8235 for task B and 0.2657 for task C. In this report, we describe how the task was set up, report the system results, and discuss them.

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).

    STaRS.sys: designing and building a commonsense-knowledge enriched wordnet for therapeutic purposes

    This thesis investigates the possibility of exploiting human language resources and knowledge extraction techniques to build STaRS.sys, a software system designed to support therapists in the rehabilitation of Italian anomic patients. After an introductory section reviewing classification, assessment, and remediation methods for naming disorders, we analyze the current trends in the use of computers for the rehabilitation of language disorders. Starting from an analysis of the needs of speech therapists in their daily work with aphasic patients, we define the requirements for the STaRS.sys application and identify a number of possible uses. To implement these functionalities, STaRS.sys needs to be based on a lexical knowledge base encoding, in an explicit and computationally tractable way, at least the kind of semantic knowledge contained in the so-called feature norms. As a backbone for the development of this semantic resource, we chose the Italian MultiWordNet lexicon, derived from the original Princeton WordNet. We show that the WordNet model is relatively well suited to our needs, but that an extension of its semantic model is nevertheless needed. Starting from the assumption that the kinds composing the feature type classifications used for encoding feature norms can be mapped onto semantic relations in a WordNet-like semantic network, we identified a set of 25 semantic relations that can cover all the information contained in these datasets. To demonstrate the feasibility of our proposal, we first asked a group of therapists to use our feature type classification to classify a set of 300 features. The analysis of the inter-coder agreement shows that the proposed classification can be used reliably by speech therapists.
    Subsequently, we collected a new set of Italian feature norms for 50 concrete concepts and analyzed the issues raised by the attempt to encode them into a version of MultiWordNet extended to include the new set of relations. This analysis shows that, in addition to extending the relation set, a number of further modifications are needed, for instance to encode negation, quantification, or the strength of a relation; as we will show, this information is not well represented in the existing feature norms either. After defining an extended version of MultiWordNet (sMWN), suitable for encoding the information contained in feature norms, we deal with the issue of automatically extracting such semantic information from corpora. We applied to an Italian corpus a state-of-the-art machine-learning-based method for the extraction of common-sense conceptual knowledge from corpora, previously applied to English. We tried a number of modifications and extensions of the original algorithm, with the aim of improving its accuracy. Results and limitations are presented and analyzed, and possible future improvements are discussed.

    Investigating Dowty’s proto-roles with embeddings

    Distributional semantics represents words as multidimensional vectors recording their statistical distribution in context. Notwithstanding the wide use of this approach in fields as distant as Natural Language Processing, psycholinguistic modeling, and semantic analysis, relatively little work has focused on characterizing the semantic information encoded in these vectors, especially for verbs. Here we investigate whether and to what extent distributional vectors are able to encode the semantic content of Dowty’s semantic proto-roles, which can be characterized as the set of entailment relations that an argument receives by virtue of its role in the event described by a predicate (Dowty 1989, 1991). We created several linear mappings between various kinds of static embeddings and a semantic space built on the basis of the proto-role annotations collected by White et al. (2016). Our results show that, to a certain extent, proto-role information is available in distributional models, and that a linear mapping can be used to infer the semantic characteristics of the arguments of novel verbs, thus testing the possibility of developing large-scale models able to extract the semantic properties for a wide inventory of verbs. Finally, we report a qualitative analysis in which we discuss which entailment relations our technique associates with a few semantic verb classes whose semantic roles are notoriously difficult to describe.
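
The idea of a linear mapping from an embedding space to a property space can be sketched as follows. The abstract does not say how the mappings were estimated, so ordinary least squares via the normal equations is an assumption here, and the tiny matrices are invented toy values rather than real embeddings or proto-role annotations. All function names are hypothetical.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_linear_map(X, Y):
    """Least-squares W for X @ W ~ Y, from (X^T X) W = X^T Y."""
    X_t = transpose(X)
    return solve(matmul(X_t, X), matmul(X_t, Y))

# Toy training data: 3 "verbs" with 2-d embeddings (X)
# and 2 property scores each (Y)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
W = fit_linear_map(X, Y)

# Apply the learned map to the embedding of an unseen "verb"
novel = [[2.0, 1.0]]
predicted = matmul(novel, W)
print(predicted)
```

The normal equations are used only to keep the sketch dependency-free; with real embeddings, a numerical library routine or a regularized regression would be the more usual choice.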